4.6 Cochran’s Q Test for Comparing the Performance of Multiple Classifiers

Cochran’s Q test is an omnibus test: it only tells us that there is a difference among the models, not which models differ.

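Test statistic Q

approximately follows a χ² distribution with M − 1 degrees of freedom

M: the number of models to be evaluated

McNemar's test is the special case M = 2 (approximated by a χ² distribution with 1 degree of freedom)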
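To turn an observed Q into a p-value with the χ² approximation, we need the upper tail of χ² with M − 1 degrees of freedom. A minimal standard-library sketch follows; `chi2_sf` and `cochran_p_value` are hypothetical helper names (in practice `scipy.stats.chi2.sf` does the same job), and integer degrees of freedom are assumed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-squared with integer df >= 1 (stdlib only)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two base cases:
    #   df = 1: erfc(sqrt(x / 2)),  df = 2: exp(-x / 2)
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(half)), 1
    else:
        sf, k = math.exp(-half), 2
    # Step up two degrees of freedom at a time:
    #   sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return sf


def cochran_p_value(q: float, n_models: int) -> float:
    """p-value of Cochran's Q using the chi-squared approximation with M - 1 df."""
    return chi2_sf(q, n_models - 1)
```

With `n_models = 2` this uses a single degree of freedom, which is the McNemar case mentioned above.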

Cochran’s Q test tests the null hypothesis (H0) that there is no difference between the classification accuracies.

i.e., H0: the classification accuracies of all M classifiers are equal (ACC_1 = ACC_2 = ... = ACC_M)

Let {C_1, ..., C_M} be a set of classifiers that have all been tested on the same dataset.

M classifiers

all evaluated on the same test data

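Formula for Q

G_i: the number of the n test samples that classifier C_i classified correctly (i = 1, ..., M)

M_j: the number of classifiers (out of M) that correctly classified the j-th test sample

(i.e., an integer between 0 and M)

T = Σ_i G_i = Σ_j M_j (the equation at the bottom of p. 39): it is T, not G_i, that equals the sum of the M_j, because the total number of correct predictions can be counted either per classifier or per sample

T: the total number of correct "votes" (correct predictions) across the M classifiers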

the total number of correct number of votes among the M classifiers
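These notes do not write out the formula for Q itself. For reference, the standard form of Cochran's Q in the notation above is the following (worth double-checking against the paper):

```latex
Q = (M - 1)\,\frac{M \sum_{i=1}^{M} G_i^{2} - T^{2}}{M\,T - \sum_{j=1}^{n} M_j^{2}},
\qquad
T = \sum_{i=1}^{M} G_i = \sum_{j=1}^{n} M_j
```

Since T = Σ_j M_j, the denominator equals Σ_j M_j (M − M_j), so it is zero only when every test sample is classified correctly by all classifiers or by none of them.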

Kuncheva (2004), "Combining Pattern Classifiers: Methods and Algorithms"

See also the example below (Eq. (51))

we typically organize the classifier predictions in a binary n × M matrix (number of test examples vs. the number of classifiers)

i.e., the predictions must first be arranged as a binary n × M matrix, where

n: number of test samples

M: number of classifiers

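Matrix element (i, j) = 0: classifier C_j misclassified test sample x_i

Matrix element (i, j) = 1: classifier C_j correctly classified test sample x_i

(here i indexes samples and j indexes classifiers, the reverse of the G_i / M_j notation above)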
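Given such a matrix, G_i are the column sums, M_j are the row sums, and Q follows from the formula for Cochran's Q. A minimal sketch, assuming rows are test samples and columns are classifiers; `cochrans_q_from_matrix` is a hypothetical name, not a library function.

```python
from typing import List, Tuple


def cochrans_q_from_matrix(X: List[List[int]]) -> Tuple[float, List[int], List[int], int]:
    """Cochran's Q from an n x M binary matrix.

    Rows are test samples, columns are classifiers; X[j][i] == 1 means
    classifier C_i classified sample j correctly, 0 means it did not.
    Returns (Q, G, M_row, T).
    """
    n_models = len(X[0])
    if n_models < 2 or any(len(row) != n_models for row in X):
        raise ValueError("X must be a rectangular matrix with at least 2 columns")
    # G_i: correct predictions of classifier C_i (column sums)
    G = [sum(row[i] for row in X) for i in range(n_models)]
    # M_j: number of classifiers that got sample j right (row sums)
    M_row = [sum(row) for row in X]
    T = sum(G)
    assert T == sum(M_row)  # counting per classifier or per sample gives the same total
    numerator = (n_models - 1) * (n_models * sum(g * g for g in G) - T * T)
    denominator = n_models * T - sum(m * m for m in M_row)
    if denominator == 0:
        # every sample is either right for all classifiers or wrong for all of them
        raise ValueError("Q is undefined: the classifiers agree on every sample")
    return numerator / denominator, G, M_row, T
```

For example, the 6 × 3 matrix `[[1, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]` gives G = [4, 2, 4], T = 10 and Q = 2 · (3 · 36 − 100) / (30 − 22) = 2.0. With two columns, Q reduces to the uncorrected McNemar statistic (b − c)² / (b + c), consistent with McNemar being the M = 2 case.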

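Example from "Combining Pattern Classifiers: Methods and Algorithms"

y_true: the ground truth is 0 for every sample, so a prediction of 0 is correct

Three classifiers C_1, C_2, C_3 (Table 2)

0 means correct and 1 means incorrect (the reverse of the matrix encoding above, because y_true is all 0)

T = number of correct predictions summed over the classifiers = here the number of 0s = 84 + 92 + 92 = 268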
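Comparing each prediction with y_true converts any label encoding, including the "0 = correct" one of Table 2, into the 1 = correct matrix described earlier. A minimal sketch on made-up toy data (not the Table 2 data); `correctness_matrix` is a hypothetical helper. Applied to Table 2, the same column sums would be G = (84, 92, 92) and T = 268.

```python
from typing import List, Sequence


def correctness_matrix(y_true: Sequence[int], *y_preds: Sequence[int]) -> List[List[int]]:
    """n x M matrix with 1 where a classifier's prediction matches y_true, else 0."""
    n = len(y_true)
    if any(len(y_pred) != n for y_pred in y_preds):
        raise ValueError("every prediction vector must have the same length as y_true")
    return [[int(y_pred[j] == y_true[j]) for y_pred in y_preds] for j in range(n)]


# Toy data in the style of Table 2: the ground truth is all 0,
# so a prediction of 0 is correct and 1 is incorrect.
y_true = [0, 0, 0, 0, 0]
c1 = [0, 1, 0, 0, 1]
c2 = [0, 0, 0, 1, 0]
c3 = [0, 0, 0, 0, 0]

X = correctness_matrix(y_true, c1, c2, c3)
G = [sum(col) for col in zip(*X)]  # correct predictions per classifier
T = sum(G)                          # here: the number of 0s over all classifiers
```

Here G = [3, 4, 5] and T = 12, which is exactly the count of 0s in `c1`, `c2` and `c3`.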

Implementation of Cochran's Q test: cochrans_q in mlxtend (mlxtend.evaluate), documented as "cochrans_q: Cochran's Q test for comparing multiple classifiers"

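In the example the null hypothesis was rejected → move on to post hoc (pairwise) tests

McNemar's test with a Bonferroni correction

Caveat: the number of pairwise comparisons, M(M − 1)/2, grows quickly with the number of models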
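The post hoc step can be sketched as one McNemar test per pair of classifiers, with the Bonferroni correction applied by multiplying each p-value by the number of pairs (equivalently, comparing the raw p-value with α divided by that number). This assumes the continuity-corrected χ² form of McNemar's test and the 1 = correct matrix from above; `mcnemar_p` and `pairwise_mcnemar_bonferroni` are hypothetical names.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def mcnemar_p(correct_a: List[int], correct_b: List[int]) -> float:
    """Continuity-corrected McNemar test on two 0/1 correctness vectors."""
    # Discordant pairs: b = A right and B wrong, c = A wrong and B right
    b = sum(1 for xa, xb in zip(correct_a, correct_b) if xa == 1 and xb == 0)
    c = sum(1 for xa, xb in zip(correct_a, correct_b) if xa == 0 and xb == 1)
    if b + c == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Clamped at 0 so that b == c yields a statistic of 0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def pairwise_mcnemar_bonferroni(
    X: List[List[int]], alpha: float = 0.05
) -> Dict[Tuple[int, int], Tuple[float, bool]]:
    """Bonferroni-adjusted McNemar tests for every pair of columns of X."""
    n_models = len(X[0])
    columns = [[row[i] for row in X] for i in range(n_models)]
    pairs = list(combinations(range(n_models), 2))  # M(M - 1) / 2 comparisons
    results = {}
    for i, k in pairs:
        p = mcnemar_p(columns[i], columns[k])
        p_adj = min(1.0, p * len(pairs))  # Bonferroni: scale p by the number of tests
        results[(i, k)] = (p_adj, p_adj < alpha)
    return results
```

Each entry maps a pair of column indices to (adjusted p-value, reject H0 at α). Bonferroni is simple but conservative, and it becomes more so as the number of pairs grows.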

Peter H. Westfall, James F. Troendle, and Gene Pennello wrote a nice article on how to approach situations where we want to compare multiple models to each other

See Westfall et al. (2010), "Multiple McNemar tests" (on my to-read list)

no free lunch theorem of statistical tests

in practice, if we are honest and rigorous, the process of multiple hypothesis testing with appropriate corrections can be a useful aid in decision making

i.e., even though there is no free lunch in statistical testing, honest and rigorous multiple hypothesis testing with appropriate corrections is still a useful aid to decision making